79 research outputs found
Gauge Invariant Framework for Shape Analysis of Surfaces
This paper describes a novel framework for computing geodesic paths in shape
spaces of spherical surfaces under an elastic Riemannian metric. The novelty
lies in defining this Riemannian metric directly on the quotient (shape) space,
rather than inheriting it from pre-shape space, and using it to formulate a
path energy that measures only the normal components of velocities along the
path. In other words, this paper defines and solves for geodesics directly on
the shape space and avoids complications resulting from the quotient operation.
This comprehensive framework is invariant to arbitrary parameterizations of
surfaces along paths, a property termed gauge invariance. Additionally,
this paper makes a link between different elastic metrics used in the computer
science literature on one hand, and the mathematical literature on the other
hand, and provides a geometrical interpretation of the terms involved. Examples
using real and simulated 3D objects are provided to help illustrate the main
ideas. Comment: 15 pages, 11 figures, to appear in IEEE Transactions on Pattern
Analysis and Machine Intelligence in a better resolution.
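As a rough illustration of an energy that "measures only the normal components of velocities", the sketch below assumes a surface sampled at points with known unit normals and one velocity vector per point. The function name `normal_energy` and the plain sum of squares are assumptions made for illustration; this is not the elastic metric defined in the paper. It shows only that adding tangential motion (a change of parameterization along the path) leaves such an energy unchanged.

```python
import numpy as np

def normal_energy(velocities, normals):
    """Sum of squared normal components of per-point velocities.

    velocities, normals: arrays of shape (n_points, 3); normals are unit length.
    Hypothetical discretization, for illustration only.
    """
    normal_speed = np.sum(velocities * normals, axis=1)  # <v, n> at each point
    return float(np.sum(normal_speed ** 2))

rng = np.random.default_rng(0)
normals = rng.normal(size=(100, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
velocities = rng.normal(size=(100, 3))

# Build an arbitrary tangential field by removing the normal part of random vectors.
random_field = rng.normal(size=(100, 3))
tangential = random_field - np.sum(random_field * normals, axis=1, keepdims=True) * normals

e_original = normal_energy(velocities, normals)
e_shifted = normal_energy(velocities + tangential, normals)  # same value up to rounding
```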
ConViViT -- A Deep Neural Network Combining Convolutions and Factorized Self-Attention for Human Activity Recognition
The Transformer architecture has gained significant popularity in computer
vision tasks due to its capacity to generalize and capture long-range
dependencies. This characteristic makes it well-suited for generating
spatiotemporal tokens from videos. On the other hand, convolutions serve as the
fundamental backbone for processing images and videos, as they efficiently
aggregate information within small local neighborhoods to create spatial tokens
that describe the spatial dimension of a video. While both CNN-based
architectures and pure transformer architectures are extensively studied and
utilized by researchers, the effective combination of these two backbones has
not received comparable attention in the field of activity recognition. In this
research, we propose a novel approach that leverages the strengths of both CNNs
and Transformers in a hybrid architecture for performing activity recognition
using RGB videos. Specifically, we suggest employing a CNN network to enhance
the video representation by generating a 128-channel video that effectively
separates the human performing the activity from the background. Subsequently,
the output of the CNN module is fed into a transformer to extract
spatiotemporal tokens, which are then used for classification purposes. Our
architecture has achieved new SOTA results of 90.05%, 99.6%, and 95.09%
on HMDB51, UCF101, and ETRI-Activity3D, respectively.
A DYNAMIC GEOMETRY-BASED APPROACH FOR 4D FACIAL EXPRESSIONS RECOGNITION
In this paper we present a fully automatic approach for identity-independent facial expression recognition from 3D video sequences. To that end, we propose a novel approach to extract a scalar field that represents the deformations between faces conveying different expressions. We extract relevant features from this deformation field using LDA and then train a dynamic model on these features using HMM. Experiments conducted on the BU-4DFE dataset following state-of-the-art settings show the effectiveness of the proposed approach.
3D Dynamic Expression Recognition Based on a Novel Deformation Vector Field and Random Forest
This paper proposes a new method for facial motion extraction to represent, learn, and recognize observed expressions from 4D video sequences. The approach, called Deformation Vector Field (DVF), is based on Riemannian facial shape analysis and densely captures dynamic information from the entire face. The resulting temporal vector field is used to build the feature vector for expression recognition from 3D dynamic faces. By applying an LDA-based feature space transformation for dimensionality reduction, followed by a multiclass Random Forest learning algorithm, the proposed approach achieves a 93% average recognition rate on the BU-4DFE database and outperforms state-of-the-art approaches.
Enhancing Gender Classification by Combining 3D and 2D Face Modalities
Shape and texture provide different modalities in face-based gender classification. Although extensive work has been reported in the literature, most of it addresses the shape or texture modality individually. Only a few studies concern their combination and, to the best of our knowledge, none considers the combination with the 3D face surface. In our work, we investigate the combination of shape and texture modalities for gender classification, both as range images with gray images and as 3D meshes with gray images. In 10-fold subject-independent cross-validation with Random Forest on the FRGC-2.0 dataset, we achieved a correct gender classification rate of 93.27% ± 5.16, which outperforms each individual modality and is comparable to the state of the art. The results confirm that shape and texture modalities are complementary, and that their combination enhances the performance of face-based gender classification.
Fusion d'Experts pour une Biométrie Faciale 3D Robuste aux Déformations
Session "Posters". In this article we study the contribution of the three-dimensional geometry of the face to the recognition of individuals. The main contribution is to combine several 3D face biometrics experts (matchers) in order to achieve better performance than any of them individually, particularly in the presence of expressions. The experts used are: (E1) elastic radial curves; (E2) MS-eLBP, an extended multi-scale version of the LBP operator; (E3) the TPS non-rigid registration algorithm; and a reference expert (Eref), the well-known ICP rigid registration algorithm. Exploiting the complementarity of the experts, the approach reaches an identification rate above 99% in the presence of facial expressions on the FRGCv2 database. A comparative study with the state of the art confirms the choice and the value of combining several experts to achieve better performance.
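The abstract does not say how the experts' outputs are combined. As one common possibility, the sketch below assumes score-level fusion: each expert produces a probe-by-gallery distance matrix, the matrices are min-max normalized, and the normalized scores are summed. The function names, the sum rule, and the toy numbers are all assumptions for illustration, not the authors' method or data.

```python
import numpy as np

def min_max_normalize(scores):
    """Rescale a score matrix to [0, 1]; a constant matrix maps to zeros."""
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.zeros_like(scores, dtype=float)
    return (scores - lo) / (hi - lo)

def fuse_and_identify(expert_distances):
    """Sum-rule fusion of distance matrices (probes x gallery) from several experts.

    Returns the index of the best-matching gallery entry for each probe.
    """
    fused = sum(min_max_normalize(np.asarray(d, dtype=float)) for d in expert_distances)
    return np.argmin(fused, axis=1)

# Toy data: 2 probes, 3 gallery subjects, 3 hypothetical experts on different scales.
e1 = [[0.2, 0.9, 0.8], [0.7, 0.6, 0.1]]
e2 = [[10.0, 40.0, 35.0], [30.0, 12.0, 15.0]]
e3 = [[1.0, 5.0, 4.0], [4.5, 3.0, 0.5]]
matches = fuse_and_identify([e1, e2, e3])
```

In this toy example the second expert alone would assign the second probe to gallery entry 1, while the fused score assigns it to entry 2, which is the kind of complementarity the abstract refers to.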
Calcul statistique sur les variétés de forme pour l'analyse et la reconnaissance de visage 3D
In this thesis, we propose a unified Riemannian framework for comparing, deforming, averaging, and hierarchically organizing facial surfaces. This framework is applied to the 3D face recognition problem, where facial expressions, pose variations, and occlusions are the main challenges. The facial surfaces are represented by collections of level curves and radial curves. The set of closed curves (level curves) constitutes an infinite-dimensional nonlinear sub-manifold and is used to represent the nasal region, the most stable part of the face. The facial surface is also represented by an indexed collection of radial curves. In this case, the calculus is simpler, and the shape space of open curves is simply a hypersphere in a Hilbert space. Comparison in this shape space is done via an "elastic" metric in order to handle non-isometric (length-changing) deformations of facial surfaces. We propose algorithms for computing means and eigenvectors in these nonlinear manifolds, and hence algorithms for estimating missing parts of 3D facial surfaces. Comparison with competing approaches using a common experimental setting on the FRGCv2, GAVAB, and BOSPHORUS databases shows that our solution matches, and in some scenarios outperforms, state-of-the-art results.
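To make the hypersphere remark concrete, the sketch below assumes the square-root velocity function (SRVF) representation, which the abstract does not name: each sampled open curve is mapped to q = β' / sqrt(|β'|) and scaled to unit L2 norm, so that it lies on a unit hypersphere where the geodesic distance is the arc length arccos⟨q1, q2⟩. The function names are hypothetical, and the sketch omits the optimization over rotations and reparameterizations that a full elastic comparison requires.

```python
import numpy as np

def l2_inner(f, g, t):
    """Trapezoidal approximation of the L2 inner product of two vector-valued functions on t."""
    integrand = np.sum(f * g, axis=1)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))

def unit_srvf(curve):
    """Square-root velocity function of a sampled open curve, scaled to unit L2 norm."""
    curve = np.asarray(curve, dtype=float)
    t = np.linspace(0.0, 1.0, curve.shape[0])
    velocity = np.gradient(curve, t, axis=0)
    speed = np.linalg.norm(velocity, axis=1)
    q = velocity / np.sqrt(np.maximum(speed, 1e-12))[:, None]
    return q / np.sqrt(l2_inner(q, q, t)), t

def sphere_distance(curve_a, curve_b):
    """Great-circle distance between the unit SRVFs of two curves with equal sampling."""
    qa, t = unit_srvf(curve_a)
    qb, _ = unit_srvf(curve_b)
    return float(np.arccos(np.clip(l2_inner(qa, qb, t), -1.0, 1.0)))

s = np.linspace(0.0, 1.0, 200)
line = np.stack([s, np.zeros_like(s)], axis=1)
arc = np.stack([np.cos(np.pi * s), np.sin(np.pi * s)], axis=1)

d_same = sphere_distance(arc, 3.0 * arc + 5.0)  # translated and scaled copy: near zero
d_diff = sphere_distance(line, arc)             # different shapes: clearly positive
```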